1.
Water Res ; 247: 120804, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37925861

ABSTRACT

The world has moved into a new stage of managing the SARS-CoV-2 pandemic with minimal restrictions and reduced testing in the population, leading to reduced genomic surveillance of virus variants in individuals. Wastewater-based epidemiology (WBE) can provide an alternative means of tracking virus variants in the population, but decision-makers require confidence that it can be applied to a national scale and is comparable to individual testing data. We analysed 19,911 samples from 524 wastewater sites across England at least twice a week between November 2021 and February 2022, capturing sewage from >70% of the English population. We used amplicon-based sequencing and the phylogeny-based de-mixing tool Freyja to estimate SARS-CoV-2 variant frequencies and compared these to the variant dynamics observed in individual testing data from clinical and community settings. We show that wastewater data can reconstruct the spread of the Omicron variant across England since November 2021 in close detail and aligns closely with epidemiological estimates from individual testing data. We also show the temporal and spatial spread of Omicron within London. Our wastewater data further reliably track the transition between Omicron subvariants BA.1 and BA.2 in February 2022 at regional and national levels. Our demonstration that WBE can track the fast-paced dynamics of SARS-CoV-2 variant frequencies at a national scale and closely match individual testing data in time shows that WBE can reliably fill the monitoring gap left by reduced individual testing in a more affordable way.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Wastewater , Wastewater-Based Epidemiological Monitoring , COVID-19/epidemiology , Genomics , England/epidemiology
2.
PLoS Biol ; 21(6): e3002121, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37315073

ABSTRACT

Pluripotency defines the unlimited potential of individual cells of vertebrate embryos, from which all adult somatic cells and germ cells are derived. Understanding how the programming of pluripotency evolved has been obscured in part by a lack of data from lower vertebrates; in model systems such as frogs and zebrafish, the functions of the pluripotency genes NANOG and POU5F1 have diverged. Here, we investigated how the axolotl ortholog of NANOG programs pluripotency during development. Axolotl NANOG is absolutely required for gastrulation and germ-layer commitment. We show that in axolotl primitive ectoderm (animal caps; ACs) NANOG and NODAL activity, as well as the epigenetic modifying enzyme DPY30, are required for the mass deposition of H3K4me3 in pluripotent chromatin. We also demonstrate that all 3 protein activities are required for ACs to establish the competency to differentiate toward mesoderm. Our results suggest the ancient function of NANOG may be establishing the competence for lineage differentiation in early cells. These observations provide insights into embryonic development in the tetrapod ancestor from which terrestrial vertebrates evolved.


Subjects
Homeodomain Proteins , Pluripotent Stem Cells , Animals , Homeodomain Proteins/metabolism , Ambystoma mexicanum/genetics , Ambystoma mexicanum/metabolism , Zebrafish/genetics , Cell Differentiation , Nanog Homeobox Protein/genetics , Nanog Homeobox Protein/metabolism , Gene Expression Regulation, Developmental
3.
Microb Genom ; 9(4)2023 04.
Article in English | MEDLINE | ID: mdl-37074153

ABSTRACT

Wastewater-based epidemiology has been used extensively throughout the COVID-19 (coronavirus disease 2019) pandemic to detect and monitor the spread and prevalence of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) and its variants. It has proven an excellent, complementary tool to clinical sequencing, supporting the insights gained and helping to make informed public-health decisions. Consequently, many groups globally have developed bioinformatics pipelines to analyse sequencing data from wastewater. Accurate calling of mutations is critical in this process and in the assignment of circulating variants; yet, to date, the performance of variant-calling algorithms in wastewater samples has not been investigated. To address this, we compared the performance of six variant callers (VarScan, iVar, GATK, FreeBayes, LoFreq and BCFtools), used widely in bioinformatics pipelines, on 19 synthetic samples with known ratios of three different SARS-CoV-2 variants of concern (VOCs) (Alpha, Beta and Delta), as well as 13 wastewater samples collected in London between the 15th and 18th December 2021. We used the fundamental parameters of recall (sensitivity) and precision (positive predictive value) to confirm the presence of mutational profiles defining specific variants across the six variant callers. Our results show that BCFtools, FreeBayes and VarScan found the expected variants with higher precision and recall than GATK or iVar, although the latter identified more expected defining mutations than other callers. LoFreq gave the least reliable results due to the high number of false-positive mutations detected, resulting in lower precision. Similar results were obtained for both the synthetic and wastewater samples.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Wastewater-Based Epidemiological Monitoring , Wastewater , Algorithms
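
The precision and recall comparison described in the abstract above can be illustrated with a minimal sketch. It assumes each caller's output has already been reduced to a set of mutation labels and that the expected defining mutations are known; the function name and the mutation labels are made up for illustration and are not taken from the study.

```python
def precision_recall(called, expected):
    """Return (precision, recall) for one variant caller on one sample.

    called   -- mutation labels reported by the caller
    expected -- mutation labels known to be present in the sample
    """
    called, expected = set(called), set(expected)
    true_positives = len(called & expected)
    # Precision: fraction of reported mutations that were expected.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of expected mutations that were reported.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical labels: three expected mutations, four calls, two correct.
expected = {"A100T", "C200G", "G300A"}
called = {"A100T", "C200G", "T400C", "C500T"}
precision, recall = precision_recall(called, expected)
```

A caller that reports many false positives keeps its recall but loses precision, which is the pattern the abstract reports for LoFreq.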
4.
Front Genet ; 14: 1138582, 2023.
Article in English | MEDLINE | ID: mdl-37051600

ABSTRACT

The ongoing SARS-CoV-2 pandemic demonstrates the utility of real-time sequence analysis in monitoring and surveillance of pathogens. However, cost-effective sequencing requires that samples be PCR amplified and multiplexed via barcoding onto a single flow cell, resulting in challenges with maximising and balancing coverage for each sample. To address this, we developed a real-time analysis pipeline to maximise flow cell performance and optimise sequencing time and costs for any amplicon-based sequencing. We extended our nanopore analysis platform minoTour to incorporate ARTIC network bioinformatics analysis pipelines. MinoTour predicts which samples will reach sufficient coverage for downstream analysis and runs the ARTIC network's Medaka pipeline once sufficient coverage has been reached. We show that stopping a viral sequencing run earlier, at the point that sufficient data has become available, has no negative effect on subsequent downstream analysis. A separate tool, SwordFish, is used to automate adaptive sampling on Nanopore sequencers during the sequencing run. This enables normalisation of coverage both within (amplicons) and between samples (barcodes) on barcoded sequencing runs. We show that this process enriches under-represented samples and amplicons in a library as well as reducing the time taken to obtain complete genomes without affecting the consensus sequence.
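
The idea of stopping a run once every sample has sufficient coverage can be sketched as a simple threshold check. This is only an illustration under stated assumptions: the function names, the depth threshold of 20 and the 90% amplicon fraction are hypothetical, and the sketch does not reproduce the prediction step the abstract describes.

```python
def sample_has_sufficient_coverage(amplicon_depths, min_depth=20, min_fraction=0.9):
    """True when enough amplicons of one sample have reached min_depth."""
    if not amplicon_depths:
        return False
    covered = sum(1 for depth in amplicon_depths if depth >= min_depth)
    return covered / len(amplicon_depths) >= min_fraction


def run_can_stop(depths_by_barcode, **thresholds):
    """True when every barcoded sample has sufficient coverage."""
    return all(
        sample_has_sufficient_coverage(depths, **thresholds)
        for depths in depths_by_barcode.values()
    )


# Hypothetical per-amplicon read depths for two barcoded samples.
depths = {
    "barcode01": [25, 40, 31, 22],
    "barcode02": [25, 5, 31, 22],  # one amplicon is still under-covered
}
stop_now = run_can_stop(depths)
```

Here `stop_now` is false because one amplicon in `barcode02` lags behind, which is the kind of imbalance that coverage normalisation during the run is meant to reduce.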

5.
Genome Biol Evol ; 15(1)2023 01 04.
Article in English | MEDLINE | ID: mdl-36542479

ABSTRACT

Koala populations show marked differences in inbreeding levels and in the presence or absence of the endogenous Koala retrovirus (KoRV). These genetic differences among populations may lead to severe disease impacts threatening koala population viability. In addition, the recent colonization of the koala genome by KoRV provides a unique opportunity to study the process of retroviral adaptation to vertebrate genomes and the impact this has on speciation, genome structure, and function. The genome build described here is from an animal from the bottlenecked Southern population free of endogenous and exogenous KoRV. It provides a more contiguous genome build than the previous koala reference derived from an animal from a more outbred Northern population and is the first koala genome from a KoRV polymerase-free animal.


Subjects
Endogenous Retroviruses , Gammaretrovirus , Phascolarctidae , Retroviridae Infections , Animals , Phascolarctidae/genetics , Australia/epidemiology , Retroviridae Infections/epidemiology , Retroviridae Infections/genetics , Retroviridae/genetics , Gammaretrovirus/genetics
6.
Commun Biol ; 5(1): 929, 2022 09 08.
Article in English | MEDLINE | ID: mdl-36075960

ABSTRACT

The underlying mechanisms driving paternally-programmed metabolic disease in offspring remain poorly defined. We fed male C57BL/6 mice either a control normal protein diet (NPD; 18% protein) or an isocaloric low protein diet (LPD; 9% protein) for a minimum of 8 weeks. Using artificial insemination, in combination with vasectomised male mating, we generated offspring using either NPD or LPD sperm but in the presence of NPD or LPD seminal plasma. Offspring from either LPD sperm or seminal fluid display elevated body weight and tissue dyslipidaemia from just 3 weeks of age. These changes become more pronounced in adulthood, occurring in conjunction with altered hepatic metabolic and inflammatory pathway gene expression. Second generation offspring also display differential tissue lipid abundance, with profiles similar to those of first generation adults. These findings demonstrate that offspring metabolic homeostasis is perturbed in response to a suboptimal paternal diet with the effects still evident within a second generation.


Subjects
Diet, Protein-Restricted , Semen , Animals , Fathers , Homeostasis , Humans , Male , Mice , Mice, Inbred C57BL
7.
Front Cell Infect Microbiol ; 12: 841138, 2022.
Article in English | MEDLINE | ID: mdl-35531335

ABSTRACT

A sexual cycle was described in 2009 for the opportunistic fungal pathogen Aspergillus fumigatus, opening up for the first time the possibility of using techniques reliant on sexual crossing for genetic analysis. The present study was undertaken to evaluate whether the technique 'bulk segregant analysis' (BSA), which involves detection of differences between pools of progeny varying in a particular trait, could be applied in conjunction with next-generation sequencing to investigate the underlying basis of monogenic traits in A. fumigatus. Resistance to the azole antifungal itraconazole was chosen as a model, with a dedicated bioinformatic pipeline developed to allow identification of SNPs that differed between the resistant progeny pool and resistant parent compared to the sensitive progeny pool and parent. A clinical isolate exhibiting monogenic resistance to itraconazole of unknown basis was crossed to a sensitive parent and F1 progeny used in BSA. In addition, the use of backcrossing and increasing the number in progeny pools was evaluated as ways to enhance the efficiency of BSA. Use of F1 pools of 40 progeny led to the identification of 123 candidate genes with SNPs distributed over several contigs when aligned to an A1163 reference genome. Successive rounds of backcrossing enhanced the ability to identify specific genes and a genomic region, with BSA of progeny (using 40 per pool) from a third backcross identifying 46 genes with SNPs, and BSA of progeny from a sixth backcross identifying 20 genes with SNPs in a single 292 kb region of the genome. The use of an increased number of 80 progeny per pool also increased the resolution of BSA, with 29 genes demonstrating SNPs between the different sensitive and resistant groupings detected using progeny from just the second backcross with the majority of variants located on the same 292 kb region. 
Further bioinformatic analysis of the 292 kb region identified the presence of a cyp51A gene variant resulting in a methionine to lysine (M220K) change in the CYP51A protein, which was concluded to be the causal basis of the observed resistance to itraconazole. The future use of BSA in genetic analysis of A. fumigatus is discussed.


Subjects
Aspergillus fumigatus , Azoles , Antifungal Agents/pharmacology , Aspergillus fumigatus/metabolism , Azoles/pharmacology , Drug Resistance, Fungal/genetics , Fungal Proteins/genetics , Fungal Proteins/metabolism , Itraconazole/metabolism , Itraconazole/pharmacology , Microbial Sensitivity Tests
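
The core comparison in bulk segregant analysis, finding SNPs that differ between the resistant and sensitive progeny pools, can be sketched as follows. This is a simplified illustration, not the dedicated pipeline the abstract mentions: it assumes alternate-allele frequencies per site have already been computed for each pool, it ignores the parental genotypes, and the function name, contig names and the 0.9 threshold are hypothetical.

```python
def candidate_snps(resistant_pool, sensitive_pool, min_delta=0.9):
    """Return sites whose alt-allele frequency differs strongly between pools.

    Each pool maps a (contig, position) key to the alternate-allele
    frequency observed in that pool's pooled sequencing reads.
    """
    shared_sites = resistant_pool.keys() & sensitive_pool.keys()
    return sorted(
        site
        for site in shared_sites
        if abs(resistant_pool[site] - sensitive_pool[site]) >= min_delta
    )


# Hypothetical frequencies: only the first site segregates with the trait.
resistant = {("contig1", 1200): 0.98, ("contig1", 5400): 0.52, ("contig2", 300): 1.0}
sensitive = {("contig1", 1200): 0.03, ("contig1", 5400): 0.47, ("contig2", 300): 0.95}
hits = candidate_snps(resistant, sensitive)
```

For a monogenic trait, unlinked sites drift toward similar frequencies in both pools, so backcrossing and larger pools should shrink the list of hits toward the causal region, in line with what the abstract reports.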
9.
Genome Biol ; 23(1): 54, 2022 02 14.
Article in English | MEDLINE | ID: mdl-35164830

ABSTRACT

BACKGROUND: Ribosomal DNA (rDNA) displays substantial inter-individual genetic variation in human and mouse. A systematic analysis of how this variation impacts epigenetic states and expression of the rDNA has thus far not been performed. RESULTS: Using a combination of long- and short-read sequencing, we establish that 45S rDNA units in the C57BL/6J mouse strain exist as distinct genetic haplotypes that influence the epigenetic state and transcriptional output of any given unit. DNA methylation dynamics at these haplotypes are dichotomous and life-stage specific: at one haplotype, the DNA methylation state is sensitive to the in utero environment, but refractory to post-weaning influences, whereas other haplotypes entropically gain DNA methylation during aging only. On the other hand, individual rDNA units in human show limited evidence of genetic haplotypes, and hence little discernible correlation between genetic and epigenetic states. However, in both species, adjacent units show similar epigenetic profiles, and the overall epigenetic state at rDNA is strongly positively correlated with the total rDNA copy number. Analysis of different mouse inbred strains reveals that in some strains, such as 129S1/SvImJ, the rDNA copy number is only approximately 150 copies per diploid genome and DNA methylation levels are < 5%. CONCLUSIONS: Our work demonstrates that rDNA-associated genetic variation has a considerable influence on rDNA epigenetic state and consequently rRNA expression outcomes. In the future, it will be important to consider the impact of inter-individual rDNA (epi)genetic variation on mammalian phenotypes and diseases.


Subjects
DNA Methylation , RNA, Ribosomal , Animals , DNA, Ribosomal/genetics , Epigenesis, Genetic , Genetic Variation , Humans , Mammals/genetics , Mice , Mice, Inbred C57BL , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism
10.
Bioinformatics ; 38(4): 1133-1135, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34791062

ABSTRACT

SUMMARY: minoTour offers a Laboratory Information Management System (LIMS) for Oxford Nanopore Technologies sequencers, with real-time metrics and analysis available permanently for review. Integration of unique real-time automated analysis can reduce the time required to answer biological questions, including mapping and classification of sequence while a run is in progress. Real-time sequence data require new methods of analysis which do not wait for the completion of a run and minoTour provides a framework to allow users to exploit these features. AVAILABILITY AND IMPLEMENTATION: Source code and documentation are available at https://github.com/LooseLab/minotourcli and https://github.com/LooseLab/minotourapp. Docker images are available from https://hub.docker.com/r/adoni5/, and can be installed using a preconfigured docker-compose script at https://github.com/LooseLab/minotour-docker. An example server is available at http://137.44.59.170. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Nanopores , Software
11.
J Infect Dis ; 225(1): 10-18, 2022 01 05.
Article in English | MEDLINE | ID: mdl-34555152

ABSTRACT

Nosocomial severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections have severely affected bed capacity and patient flow. We utilized whole-genome sequencing (WGS) to identify outbreaks and focus infection control resources and intervention during the United Kingdom's second pandemic wave in late 2020. Phylogenetic analysis of WGS and epidemiological data pinpointed an initial transmission event to an admission ward, with immediate prior community infection linkage documented. High incidence of asymptomatic staff infection with genetically identical viral sequences was also observed, which may have contributed to the propagation of the outbreak. WGS allowed timely nosocomial transmission intervention measures, including admissions ward point-of-care testing and introduction of portable HEPA14 filters. Conversely, WGS excluded nosocomial transmission in 2 instances with temporospatial linkage, conserving time and resources. In summary, WGS significantly enhanced understanding of SARS-CoV-2 clusters in a hospital setting, both identifying high-risk areas and conversely validating existing control measures in other units, maintaining clinical service overall.


Subjects
COVID-19 , Cross Infection , Disease Outbreaks/prevention & control , Reverse Transcriptase Polymerase Chain Reaction/methods , Whole Genome Sequencing , Asymptomatic Infections , Cross Infection/epidemiology , Delivery of Health Care , Health Personnel , Humans , Personal Protective Equipment , Phylogeny , SARS-CoV-2
12.
Wellcome Open Res ; 6: 112, 2021.
Article in English | MEDLINE | ID: mdl-34671705

ABSTRACT

We present a genome assembly from an individual female Aquila chrysaetos chrysaetos (the European golden eagle; Chordata; Aves; Accipitridae). The genome sequence is 1.23 gigabases in span. The majority of the assembly is scaffolded into 28 chromosomal pseudomolecules, including the W and Z sex chromosomes.

14.
J Gen Virol ; 102(6)2021 06.
Article in English | MEDLINE | ID: mdl-34130773

ABSTRACT

In the early phases of the SARS coronavirus type 2 (SARS-CoV-2) pandemic, testing focused on individuals fitting a strict case definition involving a limited set of symptoms together with an identified epidemiological risk, such as contact with an infected individual or travel to a high-risk area. To assess whether this impaired our ability to detect and control early introductions of the virus into the UK, we PCR-tested archival specimens collected from patients on admission to a large UK teaching hospital who were retrospectively identified as having a clinical presentation compatible with COVID-19. In addition, we screened available archival specimens submitted for respiratory virus diagnosis, and dating back to early January 2020, for the presence of SARS-CoV-2 RNA. Our data provide evidence for widespread community circulation of SARS-CoV-2 in early February 2020 and into March that was undetected at the time due to restrictive case definitions informing testing policy. Genome sequence data showed that many of these early cases were infected with a distinct lineage of the virus. Sequences obtained from the first officially recorded case in Nottinghamshire - a traveller returning from Daegu, South Korea - also clustered with these early UK sequences, suggesting acquisition of the virus occurred in the UK and not Daegu. Analysis of a larger sample of sequences obtained in the Nottinghamshire area revealed multiple viral introductions, mainly in late February and through March. These data highlight the importance of timely and extensive community testing to prevent future widespread transmission of the virus.


Subjects
COVID-19/diagnosis , COVID-19/virology , Respiratory System/virology , SARS-CoV-2/isolation & purification , Adult , Aged , COVID-19/epidemiology , COVID-19/transmission , COVID-19 Nucleic Acid Testing , Female , Humans , Male , Mass Screening/methods , Middle Aged , Phylogeny , RNA, Viral/genetics , Retrospective Studies , SARS-CoV-2/genetics , United Kingdom/epidemiology
15.
Nat Biotechnol ; 39(4): 442-450, 2021 04.
Article in English | MEDLINE | ID: mdl-33257864

ABSTRACT

Nanopore sequencers can be used to selectively sequence certain DNA molecules in a pool by reversing the voltage across individual nanopores to reject specific sequences, enabling enrichment and depletion to address biological questions. Previously, we achieved this using dynamic time warping to map the signal to a reference genome, but the method required substantial computational resources and did not scale to gigabase-sized references. Here we overcome this limitation by using graphics processing unit (GPU) base-calling. We show enrichment of specific chromosomes from the human genome and of low-abundance organisms in mixed populations without a priori knowledge of sample composition. Finally, we enrich targeted panels comprising 25,600 exons from 10,000 human genes and 717 genes implicated in cancer, identifying PML-RARA fusions in the NB4 cell line in <15 h sequencing. These methods can be used to efficiently screen any target panel of genes without specialized sample preparation using any computer and a suitable GPU. Our toolkit, readfish, is available at https://www.github.com/looselab/readfish.


Subjects
Computational Biology/methods , Nanopore Sequencing/instrumentation , Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Cell Line, Tumor , Exons , Genome Size , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Software
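
The enrichment and depletion logic described in the abstract above, where a read is either sequenced to completion or rejected by reversing the pore voltage, can be sketched as a decision function. This is an illustrative sketch only: it is not the readfish API, and the function name, the action strings and the target intervals are hypothetical. It assumes the start of each read has already been base-called and mapped to a contig and position.

```python
def decide(contig, position, targets, mode="enrich"):
    """Choose an action for a read whose first bases mapped to contig:position.

    targets -- dict mapping contig name to a list of (start, end) intervals,
               half-open, in reference coordinates
    mode    -- "enrich" keeps on-target reads, "deplete" rejects them
    """
    on_target = any(
        start <= position < end for start, end in targets.get(contig, [])
    )
    if mode == "enrich":
        return "sequence" if on_target else "unblock"
    if mode == "deplete":
        return "unblock" if on_target else "sequence"
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical target panel with a single interval on one contig.
targets = {"chr17": [(1_000, 2_000)]}
kept = decide("chr17", 1_500, targets)
rejected = decide("chr1", 1_500, targets)
```

Because the decision is made from the first part of each molecule, the mapping step has to be fast, which is the role the abstract assigns to GPU base-calling.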
16.
PLoS One ; 15(12): e0244255, 2020.
Article in English | MEDLINE | ID: mdl-33332446

ABSTRACT

Reactive oxygen species are bona fide intracellular second messengers that influence cell metabolism and aging by mechanisms that are incompletely resolved. Mitochondria generate superoxide that is dismutated to hydrogen peroxide, which in turn oxidises cysteine-based enzymes such as phosphatases, peroxiredoxins and redox-sensitive transcription factors to modulate their activity. Signal Transducer and Activator of Transcription 3 (Stat3) has been shown to participate in an oxidative relay with peroxiredoxin II but the impact of Stat3 oxidation on target gene expression and its biological consequences remain to be established. Thus, we created murine embryonic fibroblasts (MEFs) that express either WT-Stat3 or a redox-insensitive mutant of Stat3 (Stat3-C3S). The Stat3-C3S cells differed from WT-Stat3 cells in morphology, proliferation and resistance to oxidative stress; in response to cytokine stimulation, they displayed elevated Stat3 tyrosine phosphorylation and Socs3 expression, implying that Stat3-C3S is insensitive to oxidative inhibition. Comparative analysis of global gene expression in WT-Stat3 and Stat3-C3S cells revealed differential expression (DE) of genes both under basal conditions and during oxidative stress. Using differential gene regulation pattern analysis, we identified 199 genes clustered into 10 distinct patterns that were selectively responsive to Stat3 oxidation. GO term analysis identified down-regulated genes to be enriched for tissue/organ development and morphogenesis and up-regulated genes to be enriched for cell-cell adhesion, immune responses and transport related processes. Although most DE gene promoters contain consensus Stat3 inducible elements (SIEs), our chromatin immunoprecipitation (ChIP) and ChIP-seq analyses did not detect Stat3 binding at these sites in control or oxidant-stimulated cells, suggesting that oxidised Stat3 regulates these genes indirectly. 
Our further computational analysis revealed enrichment of hypoxia response elements (HREs) within DE gene promoters, implying a role for Hif-1. Experimental validation revealed that efficient stabilisation of Hif-1α in response to oxidative stress or hypoxia required an oxidation-competent Stat3 and that depletion of Hif-1α suppressed the inducible expression of Kcnb1, a representative DE gene. Our data suggest that Stat3 and Hif-1α cooperate to regulate genes involved in immune functions and developmental processes in response to oxidative stress.


Subjects
Gene Expression Regulation, Developmental , Hypoxia-Inducible Factor 1, alpha Subunit/metabolism , Oxidative Stress , Promoter Regions, Genetic , Response Elements , STAT3 Transcription Factor/chemistry , STAT3 Transcription Factor/physiology , Animals , Fibroblasts/cytology , Fibroblasts/metabolism , Hypoxia-Inducible Factor 1, alpha Subunit/genetics , Mice , Mice, Knockout , Signal Transduction , Transcriptional Activation
17.
Nature ; 585(7823): 79-84, 2020 09.
Article in English | MEDLINE | ID: mdl-32663838

ABSTRACT

After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist [1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38 [2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome [3], we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.


Subjects
Chromosomes, Human, X/genetics , Genome, Human/genetics , Telomere/genetics , Centromere/genetics , CpG Islands/genetics , DNA Methylation , DNA, Satellite/genetics , Female , Humans , Hydatidiform Mole/genetics , Male , Pregnancy , Reproducibility of Results , Testis/metabolism
19.
Nat Methods ; 16(12): 1297-1305, 2019 12.
Article in English | MEDLINE | ID: mdl-31740818

ABSTRACT

High-throughput complementary DNA sequencing technologies have advanced our understanding of transcriptome complexity and regulation. However, these methods lose information contained in biological RNA because the copied reads are often short and modifications are not retained. We address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies. Our study generated 9.9 million aligned sequence reads for the human cell line GM12878, using thirty MinION flow cells at six institutions. These native RNA reads had a median length of 771 bases, and a maximum aligned length of over 21,000 bases. Mitochondrial poly(A) reads provided an internal measure of read-length quality. We combined these long nanopore reads with higher accuracy short-reads and annotated GM12878 promoter regions to identify 33,984 plausible RNA isoforms. We describe strategies for assessing 3' poly(A) tail length, base modifications and transcript haplotypes.


Subjects
Nanopore Sequencing/methods , Poly A/genetics , Sequence Analysis, RNA/methods , Transcriptome , Cells, Cultured , Humans
20.
Bioinformatics ; 35(13): 2193-2198, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30462145

ABSTRACT

MOTIVATION: The Oxford Nanopore Technologies (ONT) MinION is used for sequencing a wide variety of sample types with diverse methods of sample extraction. Nanopore sequencers output FAST5 files containing signal data subsequently base called to FASTQ format. Optionally, ONT devices can collect data from all sequencing channels simultaneously in a bulk FAST5 file enabling inspection of signal in any channel at any point. We sought to visualize this signal to inspect challenging or difficult to sequence samples. RESULTS: The BulkVis tool can load a bulk FAST5 file and overlays MinKNOW (the software that controls ONT sequencers) classifications on the signal trace and can show mappings to a reference. Users can navigate to a channel and time or, given a FASTQ header from a read, jump to its specific position. BulkVis can export regions as Nanopore base caller compatible reads. Using BulkVis, we find long reads can be incorrectly divided by MinKNOW resulting in single DNA molecules being split into two or more reads. The longest seen to date is 2 272 580 bases in length and reported in eleven consecutive reads. We provide helper scripts that identify and reconstruct split reads given a sequencing summary file and alignment to a reference. We note that incorrect read splitting appears to vary according to input sample type and is more common in 'ultra-long' read preparations. AVAILABILITY AND IMPLEMENTATION: The software is available freely under an MIT license at https://github.com/LooseLab/bulkvis. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Nanopores , DNA , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software
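
The identification of incorrectly split reads described in the abstract above can be sketched by chaining reads that come off the same channel one after another and map to adjacent reference positions. This is a minimal sketch, not the authors' helper scripts: the `Read` fields, the function name and the 1,000-base gap tolerance are hypothetical, and only forward-strand alignments are handled.

```python
from dataclasses import dataclass


@dataclass
class Read:
    read_id: str
    channel: int       # sequencing channel the read came from
    start_time: float  # seconds since the start of the run
    contig: str        # reference contig the read aligns to
    ref_start: int     # alignment start on the reference
    ref_end: int       # alignment end on the reference


def find_split_reads(reads, max_gap=1000):
    """Group consecutive same-channel reads that align adjacently."""
    groups = []
    for read in sorted(reads, key=lambda r: (r.channel, r.start_time)):
        prev = groups[-1][-1] if groups else None
        if (
            prev is not None
            and prev.channel == read.channel
            and prev.contig == read.contig
            and 0 <= read.ref_start - prev.ref_end <= max_gap
        ):
            groups[-1].append(read)  # continues the previous molecule
        else:
            groups.append([read])    # starts a new molecule
    # Only groups of two or more reads represent a split molecule.
    return [[r.read_id for r in group] for group in groups if len(group) > 1]


# Hypothetical reads: "a" and "b" are two halves of one molecule.
reads = [
    Read("a", 1, 0.0, "chr1", 0, 1000),
    Read("b", 1, 5.0, "chr1", 1010, 2500),
    Read("c", 1, 9.0, "chr2", 100, 900),
    Read("d", 2, 1.0, "chr1", 1000, 2000),
]
split_groups = find_split_reads(reads)
```

Summing the aligned span of each group gives the length of the reconstructed molecule, which is how a single DNA molecule reported across many consecutive reads could be measured.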